Work in Progress
This article is currently a work in progress. Feel free to observe while it gets filled out!
Data were gathered for the years 2015-2022, accessed on 21 Dec 2023, in Excel spreadsheets available from the American Institute of Physics (AIP); via their Roster of Physics Departments with Enrollment and Degree Data. Due to the structure of the data reporting, enrollment data are considered aggregated for the traditional academic year (beginning in the Fall of the previous year and ending in the Spring of the listed year); in addition to an aggregate of all degree recipients, awarded or extant, for the calendar extent (Jan -> Dec) of the listed year.
For example, in the 2015 report, data regarding enrollments are the
finalized numbers of the 2014-2015 academic year; plus all conferred
Bachelors (BS), MS, and PhD
certificates from January through December (inclusive) of the 2015
calendar year.
Datasheets were edited at an intermediary step to unify and
homogenize data condensation into a tidy data set.
Prior to 2017, the
Highest Physics Degree Offered field and some associated
enrollment data was not available in the survey. In our analysis, data
were infilled on the condition that if a MS or
PhD degree certificate was conferred during those years,
then the corresponding program at the appropriate level must have
existed; and if a higher level was available in 2017, then it must have
been available in 2015 or 2016. The Notes annotations field
was removed from this analysis, as the field appeared consistent across
years, and after 2019 split into a separate datasheet of the same annual
report- the Notes values and changelog are available at the
source data.
Institution-level data were adjusted for varying
spellings and canonizations over time (e.g. Appl Phy ->
Appl Phys, Coll of -, etc.). For the purposes
of this analysis, Institutions that changed from a College,
University, or system designation are simply renamed to their name as of
2022, with information lost regarding that the change occurred. For a
complete listing of the modifications made to Institution
names, consult Lines 169-231 and
269-317 in the process used to incorporate this data. The
data pipeline is encoded in a function, process_data(...),
demonstrated in the following R script:
#---- Imports ----#
#
# TODO :> these docs
#
#--------------------------------#
library(tidyverse)
library(readxl)
library(forcats)
#---- process_data(...) ----#
#
# TODO :> these docs
#
#--------------------------------#
process_data <- \(DATA_DIR) {
##########
##-----##
##----##
##---## Read Data
##----##
##-----##
##########
data <- map(
list.files(DATA_DIR, pattern="*.xlsx", full.names=T),
\(.file) {
#####
## Excel Parse
#####
readxl::read_excel(
.file,
sheet = 'data',
col_types = c(
'text', # Institution
'text', # State; 2-chr factor-level
'text', # Highest Degree Offered; 3-level factor of `BS`, `MS`, or `PhD`
'text', # Astro Program; 3-level factor of `combined`, `separate`, or `none`
'text', # Notes
'numeric', # First-Term Introductory Physics Course Enrollments
'numeric', # First-Term Introductory Physical Science and Astronomy Course Enrollments
'numeric', # Fall Junior Enrollments
'numeric', # Fall Senior Enrollments
'numeric', # Fall Total Graduate Student Enrollments
'numeric', # Fall Non-US Graduate Student Enrollments
'numeric', # Fall First-Year Graduate Student Enrollments
'numeric', # Physics Bachelors
'numeric', # Exiting Physics Masters
'numeric' # Physics PhDs
),
na = c('---', ''),
.name_repair = \(cols) { # unify column names
cols |>
gsub('(Fall [1-2]{1}([0-1]|[8-9]){1}([0-9]){1}[0-9]{1})', 'Fall', x = _, perl=TRUE) |>
gsub('(^(20[0-9]{2}\\-[0-9]{2})\\s+)|(\\-)', '', x = _, perl=TRUE) |>
gsub('\\s*(\\w+)\\s+', '\\1_', x = _, perl=TRUE)
#gsub('(First-Term)', 'FirstTerm', x = _, perl=TRUE)
}
) |>
#####
## Denote Year
#####
mutate(
Year = parse_number(
paste0(
'20',
gsub(
paste0(DATA_DIR, "*physrostr{0,1}([0-9]{2}).xlsx$"),
"\\1",
.file))),
.before = Institution
)
}) |>
##########
##-----##
##----##
##---## Process Data
##----##
##-----##
##########
map( \(.tbl) {
#####
## transform `*_Enrollments` fields, added in (TODO:> ???)
#####
if ('Fall_Total_Graduate_Student_Enrollments' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Fall_Total_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
if ('Fall_FirstYear_Graduate_Student_Enrollments' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Fall_FirstYear_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
if ('Fall_Senior_Enrollments' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Fall_Senior_Enrollments = NA, .name_repair = 'unique') }
if ('Fall_Junior_Enrollments' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Fall_Junior_Enrollments = NA, .name_repair = 'unique') }
if ('Fall_NonUS_Graduate_Student_Enrollments' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Fall_NonUS_Graduate_Student_Enrollments = NA, .name_repair = 'unique') }
if ('Highest_Physics_Degree_Offered' %in% names(.tbl) == F)
{ .tbl <- .tbl |> add_column(Highest_Physics_Degree_Offered = NA, .name_repair = 'unique') }
.tbl |>
#####
## drop unused columns
#####
select(-any_of(c('Notes', 'Highest_Degree_Offered'))) |>
#####
## transform `Highest Physics Degree Offered`, added in 2017
#####
mutate(
Highest_Physics_Degree_Offered = if_else(
is.na(Highest_Physics_Degree_Offered),
if_else(
is.na(Physics_PhDs),
if_else(
is.na(Fall_Total_Graduate_Student_Enrollments),
'BS',
'MS'
),
'PhD'
),
Highest_Physics_Degree_Offered
)) |>
mutate( Highest_Physics_Degree_Offered = as.factor(Highest_Physics_Degree_Offered) ) |>
mutate(
`Highest_Physics_Degree_Offered` = fct_relevel( `Highest_Physics_Degree_Offered`, c('BS','MS','PhD'))
) |>
#####
## transform/convert `Astro Program` into factor
#####
mutate(
Astro_Program = case_when(
Astro_Program == 'c' ~ 'combined',
Astro_Program == 's' ~ 'separate'
)
) |>
mutate(
Astro_Program = as.factor(Astro_Program)
) |>
mutate(
`Astro_Program` = fct_relevel( `Astro_Program`, c('no dept.', 'separate', 'combined'))
) |>
#####
## transform State, Year into factors
#####
mutate(State = as.factor(State)) |>
mutate(Year = as.factor(Year)) |>
#####
## transform `Appl Phy` -> `Appl Phys`
#####
mutate(
Institution = gsub("(\\(Appl Phy\\))", "\\(Appl Phys\\)", Institution)
) |>
#####
## transform Institution name `College` -> `Coll`, drop apostrophe
#####
mutate(
Institution = gsub(
"((College(s){0,1})(\\s+(of)){0,1}\\s*\\w{0})$", "Coll \\4",
Institution,
perl = TRUE
)
) |>
mutate(
Institution = Institution |>
gsub("'", '', x = _) |>
gsub("*", '', x = _) |>
trimws(which = "both") |>
str_squish()
) |>
#####
## transform Institution names for continuity
#####
mutate(
Institution = Institution |>
gsub("(Coll\\.)", "Coll", x=_) |>
gsub("(\\*)", "", x=_) |>
gsub("(Maryland-U of, Coll Park)", "Maryland-U of, College Park", x=_) |>
gsub("(Minnesota-U of, Minnpls)", "Minnesota-U of, Minnpls/TwinCities", x=_) |>
gsub("(Minnesota-U of, Twin Cities)", "Minnesota-U of, Minnpls/TwinCities", x=_) |>
gsub("(Mary Baldwin Coll)", "Mary Baldwin U", x=_) |>
gsub("(Piedmont Coll)", "Piedmont U", x=_) |>
gsub("(William & Mary-Coll of)", "William & Mary", x=_) |>
gsub("(SUNY Coll at Brockport)", "SUNY Brockport", x=_) |>
gsub("(Notre Dame of MD-Coll of)", "Notre Dame of MD U", x=_) |>
gsub("(Fresno State U)", "Cal St U-Fresno", x=_) |>
gsub("(Muskingum Coll)", "Muskingum U", x=_) |>
gsub("(Central Methodist Coll)", "Central Methodist U", x=_) |>
gsub("(Indiana U Purdue U-Ft Wayne)", "Purdue U-Ft Wayne", x=_) |>
gsub("(Purdue U-Calumet)", "Purdue U-Northwest", x=_) |>
gsub("(Armstrong Atlantic St U)", "Armstrong State U", x=_) |>
gsub("(Armstrong State U)", "Georgia Southern U", x=_) |>
gsub("(Lynchburg Coll)", "Lynchburg-U of", x=_) |>
gsub("(St. John Fisher Coll)", "St. John Fisher U", x=_) |>
gsub("(Greenville Coll)", "Greenville U", x=_) |>
gsub("(Bloomsburg U of PA)", "Commonwealth U of PA", x=_) |>
gsub("(Roberts Wesleyan Coll)", "Roberts Wesleyan U", x = _) |>
gsub("(Doane Coll)", "Doane U", x = _) |>
gsub("(Simmons Coll)", "Simmons U", x = _) |>
gsub("(Thomas More Coll)", "Thomas More U", x = _) |>
gsub("(Linfield Coll)", "Linfield U", x = _) |>
gsub("(Dordt Coll)", "Dordt U", x = _) |>
gsub("(Otterbein Coll)", "Otterbein U", x = _) |>
gsub("(Messiah Coll)", "Messiah U", x = _) |>
gsub("(Sacramento State U)", "Cal St U-Sacramento", x = _) |>
gsub("(Pennsylvania St U-Erie)", "Pennsylvania St Behrend", x = _) |>
gsub("(New York U, Polytechnic Sch. of Eng.)", "New York U, Tandon Sch. of Engrg.", x = _) |>
gsub("(Calvin Coll)", "Calvin U", x = _) |>
gsub("(Augusta State U)", "Augusta U", x = _) |>
gsub("(Elmhurst Coll)", "Elmhurst U", x = _) |>
gsub("(Moravian Coll)", "Moravian U", x = _) |>
gsub("(Augsburg Coll)", "Augsburg U", x = _) |>
gsub("(Centre Coll of KY)", "Centre Coll", x = _) |>
gsub("(Humboldt State U)", "Cal St Poly U-Humboldt", x = _) |>
gsub("(Texas State U-San Marcos)", "Texas State U", x = _) |>
gsub("(Richard Stockton Coll of NJ)", "Stockton U", x = _) |>
gsub("(Chatham Coll)", "Chatham U", x = _) |>
gsub("(The Sciences of Philadelphia-U of)", "The Sciences-U of", x = _) |>
gsub("(Baldwin-Wallace Coll)", "Baldwin-Wallace U", x = _) |>
gsub("(St\\. Catherine-Coll of)", "St. Catherine U", x = _) |>
gsub("(Walla Walla Coll)", "Walla Walla U", x = _) |>
gsub("(New York U \\(NYU\\))", "New York U, School of Arts & Science", x = _) |>
gsub("(Engrg\\.g\\.)", "Engrg.", x = _) |>
gsub("(Whitworth Coll)", "Whitworth U", x = _) |>
gsub("(King Coll)", "King U", x = _) |>
gsub("(NJIT/Rutgers U-Newark)", "New Jersey Inst of Tech", x = _) |>
gsub("(Rutgers U-Newark/NJIT)", "Rutgers U-Newark", x = _) |>
gsub("(St\\. Peters Coll)", "St. Peters U", x = _) |>
gsub("(MO-U of, Rolla)", "Missouri U of Sci & Tech", x = _) |>
gsub("(Metropolitan St Coll of Denver)", "Metropolitan St U of Denver", x = _) |>
gsub("(Mesa State Coll)", "Colorado Mesa U", x = _) |>
gsub("(Southern Polytechnic St U)", "Kennesaw State U", x = _) |>
gsub("(Bridgewater State Coll)", "Bridgewater State U", x = _) |>
gsub("(W\\. Virginia Wesleyan Coll)", "West Virginia Wesleyan Coll", x = _) |>
gsub("(Cal Poly St U-San L\\.O\\.)", "Cal Poly St U-San Luis Obispo", x = _) |>
gsub("(Elon Coll)", "Elon U", x = _) |>
gsub("(Manchester Coll)", "Manchester U", x = _) |>
gsub("(Utah Valley State Coll)", "Utah Valley U", x = _) |>
gsub("(Mount Union Coll)", "Mount Union-U of", x = _) |>
gsub("(Albertson Coll of Idaho)", "Coll of Idaho", x = _) |>
gsub("(Cal St U-Hayward)", "Cal St U-East Bay", x = _) |>
gsub("(Point Loma Nazarene Coll)", "Point Loma Nazarene U", x = _) |>
gsub("(Colorado St U, Fort Collins)", "Colorado St U-Fort Collins", x = _) |>
gsub("(Colorado State U)", "Colorado St U-Fort Collins", x = _) |>
gsub("(Yale U \\(Appl Sci\\))", "Yale U (Appl Phys)", x = _) |>
gsub("(West Georgia-State U of)", "West Georgia-U of", x = _) |>
gsub("(North Georgia Coll & St U)", "North Georgia-U of", x = _) |>
gsub("(Notre Dame-Coll of, MD)", "Notre Dame of MD U", x = _) |>
gsub("(Cumberland Coll)", "Cumberlands-U of the", x = _) |>
gsub("(Missouri Southern St Coll)", "Missouri Southern St U", x = _) |>
gsub("(Missouri St U)", "Missouri State U", x = _) |>
gsub("(Central Missouri State U)", "Central Missouri-U of", x = _) |>
gsub("(Southwest Missouri State U)", "Missouri State U", x = _) |>
gsub("(Georgian Court Coll)", "Georgian Court U", x = _) |>
gsub("(OK-U of Sci and Arts)", "Sci and Arts of OK-U of", x = _) |>
gsub("(Southwest Texas St U)", "Texas State U", x = _) |>
gsub("(Houston-U of-Downtown)", "Houston-U of, Downtown", x = _) |>
gsub("(Mary Washington Coll)", "Mary Washington-U of", x = _) |>
gsub("(Virginia Tech)", "Virginia Polytech Inst & St U", x = _) |>
gsub("(Randolph-Macon Womans Coll)", "Randolph Coll", x = _)
) |>
mutate(
Institution = case_when(
Institution == "Augustana Coll" & State == "SD" ~ "Augustana U",
Institution == "Xavier U" & State == "LA" ~ "Xavier U of Louisiana",
Institution == "Union Coll" & State == "NY" ~ "Union Coll (NY)",
Institution == "Union Coll" & State == "NE" ~ "Union Coll (NE)",
Institution == "Westminster Coll" & State == "PA" ~ "Westminster Coll (PA)",
Institution == "Westminster Coll" & State == "UT" ~ "Westminster Coll (UT)",
Institution == "Westminster Coll" & State == "MO" ~ "Westminster Coll (MO)",
Institution == "St. Thomas-U of" & State == "MN" ~ "St. Thomas-U of (MN)",
Institution == "St. Thomas-U of" & State == "TX" ~ "St. Thomas-U of (TX)",
Institution == "Wheaton Coll" & State == "IL" ~ "Wheaton Coll (IL)",
Institution == "Wheaton Coll" & State == "MA" ~ "Wheaton Coll (MA)",
Institution == "Embry-Riddle Aeronautical U" & State == "FL" ~ "Embry-Riddle Aeronautical U (FL)",
Institution == "Embry-Riddle Aeronautical U" & State == "AZ" ~ "Embry-Riddle Aeronautical U (AZ)",
Institution == "Georgetown U" & State == "KY" ~ "Georgetown Coll",
Institution == "Lincoln U" & State == "MO" ~ "Lincoln U (MO)",
Institution == "Lincoln U" & State == "PA" ~ "Lincoln U (PA)",
Institution == "Bethel Coll" & State == "MN" ~ "Bethel U",
Institution == "St. Johns U" & State == "MN" ~ "Coll of St. Benedict / St. Johns U",
Institution == "Loyola Coll" & State == "MD" ~ "Loyola U of MD",
.default = Institution
)
) |>
## remove `TN-U of, Space Inst`, dupl by `TN-U of, Knoxville`, the host Inst.
#filter(!(Institution == "TN-U of, Space Inst")) |>
#####
## set column order
#####
relocate(
`Year`,
`Institution`,
`State`,
`Highest_Physics_Degree_Offered`,
`Fall_Total_Graduate_Student_Enrollments`,
`Physics_PhDs`,
`Exiting_Physics_Masters`,
`Fall_FirstYear_Graduate_Student_Enrollments`,
`Physics_Bachelors`,
`Fall_Senior_Enrollments`,
`Fall_Junior_Enrollments`,
`FirstTerm_Introductory_Physics_Course_Enrollments`,
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`,
`Fall_NonUS_Graduate_Student_Enrollments`,
`Astro_Program`
) |>
select(
`Year`,
`Institution`,
`State`,
`Highest_Physics_Degree_Offered`,
`Fall_Total_Graduate_Student_Enrollments`,
`Physics_PhDs`,
`Exiting_Physics_Masters`,
`Fall_FirstYear_Graduate_Student_Enrollments`,
`Physics_Bachelors`,
`Fall_Senior_Enrollments`,
`Fall_Junior_Enrollments`,
`FirstTerm_Introductory_Physics_Course_Enrollments`,
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments`,
`Fall_NonUS_Graduate_Student_Enrollments`,
`Astro_Program`
)
}) |>
##########
##-----##
##----##
##---## Save Data
##----##
##-----##
##########
## unify observation collection
list_rbind() |>
## group observations of Institution by Year, then by State
group_by(State, Institution, Year) |>
## sort-asc within previous group by quantity
arrange(Physics_PhDs, .by_group = TRUE)
##########
##-----##
##----##
##---## Targeted Adjustments
##----##
##-----##
##########
## `Georgia Southern U`
gdata <-
data |>
filter(Institution == "Georgia Southern U")
data <- anti_join(data, gdata, by = 'Institution')
gdata <-
gdata |>
ungroup() |>
group_by(Year) |>
summarise(
`Year` = Year,
`Institution` = Institution,
`State` = State,
`Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
`Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
`Physics_PhDs` = sum(Physics_PhDs),
`Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
`Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
`Physics_Bachelors` = sum(Physics_Bachelors),
`Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
`Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
`FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
`Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
`Astro_Program` = Astro_Program
) |>
distinct()
data <- data |> ungroup()
data <- full_join(data, gdata)
## `UT-Brownsville` + `UT-PanAm` -> `UT-RioGrandeValley`
gdata <-
data |>
filter(Institution %in% c(
"Texas-U of, at Brownsville",
"Texas-U of, Pan American",
"Texas-U of, Rio Grande Valley"
))
data <- anti_join(data, gdata, by = 'Institution')
gdata <-
gdata |>
ungroup() |>
mutate(
Institution = "Texas-U of, Rio Grande Valley",
`Highest_Physics_Degree_Offered` = case_when(
`Highest_Physics_Degree_Offered` == 'BS' ~ 'MS',
.default = `Highest_Physics_Degree_Offered`
)
) |>
group_by(Year) |>
summarise(
`Year` = Year,
`Institution` = Institution,
`State` = State,
`Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
`Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
`Physics_PhDs` = sum(Physics_PhDs),
`Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
`Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
`Physics_Bachelors` = sum(Physics_Bachelors),
`Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
`Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
`FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
`Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
`Astro_Program` = Astro_Program
) |>
distinct()
data <- data |> ungroup()
data <- full_join(data, gdata)
## `Mansfield U` + `Lock Haven U` + `Bloomsburg U` -> `Commonwealth U of PA`
gdata <-
data |>
filter(Institution %in% c(
"Mansfield U",
"Lock Haven U",
"Bloomsburg U",
"Commonwealth U of PA"
))
data <- anti_join(data, gdata, by = 'Institution')
gdata <-
gdata |>
ungroup() |>
mutate(
Institution = "Commonwealth U of PA",
) |>
group_by(Year) |>
summarise(
`Year` = Year,
`Institution` = Institution,
`State` = State,
`Highest_Physics_Degree_Offered` = Highest_Physics_Degree_Offered,
`Fall_Total_Graduate_Student_Enrollments` = sum(Fall_Total_Graduate_Student_Enrollments),
`Physics_PhDs` = sum(Physics_PhDs),
`Exiting_Physics_Masters` = sum(Exiting_Physics_Masters),
`Fall_FirstYear_Graduate_Student_Enrollments` = sum(Fall_FirstYear_Graduate_Student_Enrollments),
`Physics_Bachelors` = sum(Physics_Bachelors),
`Fall_Senior_Enrollments` = sum(Fall_Senior_Enrollments),
`Fall_Junior_Enrollments` = sum(Fall_Junior_Enrollments),
`FirstTerm_Introductory_Physics_Course_Enrollments` = sum(FirstTerm_Introductory_Physics_Course_Enrollments),
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = sum(FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments),
`Fall_NonUS_Graduate_Student_Enrollments` = sum(Fall_NonUS_Graduate_Student_Enrollments),
`Astro_Program` = Astro_Program
) |>
distinct()
data <- data |> ungroup()
data <- full_join(data, gdata)
## Group and Sort
data <- data |>
## group observations of Institution by Year, then by State
group_by(State, Institution, Year) |>
## sort-asc within previous group by quantity
arrange(Physics_PhDs, .by_group = TRUE)
##########
##-----##
##----##
##---## Return
##----##
##-----##
##########
data
}Columns have summary metrics generated for groupings of observations,
given an Institution over a set of Years.
These have been marked as Accumulative or
Averaged. Data at the individual observation level
(foreach Institution, foreach Year)
have not yet been altered from their original
values - with the exception of Georgia Southern U, which
combined with another institution in 2018.
library(tidyverse)
library(reactable)
library(ggplot2)
library(plotly)
library(htmltools)
library(crosstalk)
DATA_DIR = './data/xlsx_edit/'
source('./process_data.R', local = knitr::knit_global())
##########
##-----##
##----##
##---## Data Table
##----##
##-----##
##########
rosterdata <- process_data(DATA_DIR);
tdata <- SharedData$new(rosterdata);
#####
## Generate Reactable data table
#####
rxtbl <- tdata |>
reactable(
####
## main table
####
elementId = 'rosterphys02_22-tbl',
filterable = T,
searchable = T,
groupBy = c('Institution'),
bordered = T,
#striped = T,
highlight = T,
compact = T,
fullWidth = T,
pagination = F,
showPageSizeOptions = T,
pageSizeOptions = c(6, 12, 18),
paginationType = "jump",
selection = "multiple",
onClick = "select",
height = 768,
rowStyle = JS("
function(rowInfo, state) {
if (!rowInfo) return;
// style nested rows
if (rowInfo.level > 0) {
return { background: '#eee', borderLeft: '2px solid #ffa62d' }
} else {
return { borderLeft: '2px solid transparent' }
}
}
"),
#details = function(idx) {
# d <- rosterdata |> filter(Institution == rosterdata$Institution[idx])
# htmltools::div(style = "padding: 1rem",
# reactable(
# d,
# outlined = TRUE
# )
# )
#},
defaultColDef = colDef(
header = function(value) gsub(".", " ", value, fixed = TRUE),
headerClass = "sticky tbl-header",
cell = function(value) format(value, nsmall = 1),
filterable = F,
align = "center",
minWidth = 120,
headerStyle = list(background = "#f7f7f8"),
vAlign = 'center',
headerVAlign = 'bottom',
format = colFormat(
separators = T,
digits = 0
)
),
defaultSorted = list(
Year = 'asc',
Physics_PhDs = 'desc',
Physics_Bachelors = 'desc'
),
defaultPageSize = 6,
minRows = 4,
####
## Per-Column defns
####
columns = list(
`Year` = colDef(
name = 'Year',
align = "center",
minWidth = 64,
sticky = "left",
sortable = T,
defaultSortOrder = "asc"
),
`Institution` = colDef(
name = 'Institution',
align = 'left',
minWidth = 240,
sticky = "left",
filterable = T
),
`State` = colDef(
name = 'State',
minWidth = 64,
aggregate = 'unique',
filterable = T
),
`Highest_Physics_Degree_Offered` = colDef(
name = "Highest Physics Degree Offered",
aggregate = "unique",
),
`Fall_Total_Graduate_Student_Enrollments` = colDef(
name = "Total Grad Student Enrollment (Fall)\n (Averaged)",
aggregate = "mean",
),
`Physics_PhDs` = colDef(
name = "Physics PhDs\n (Accum)",
aggregate = "sum",
format = colFormat(digits=0)
),
`Exiting_Physics_Masters` = colDef(
name = "Exiting Physics Masters\n (Accum)",
aggregate = "sum",
format = colFormat(digits=0)
),
`Fall_FirstYear_Graduate_Student_Enrollments` = colDef(
name = "First-Year Grad Student Enrollment (Fall)\n (Averaged)",
aggregate = "mean",
),
`Physics_Bachelors` = colDef(
name = "Physics Bachelors\n (Accum)",
aggregate = "sum",
format = colFormat(digits=0)
),
`Fall_Senior_Enrollments` = colDef(
name = "Senior Enrollment (Fall)\n (Averaged)",
aggregate = "mean",
),
`Fall_Junior_Enrollments` = colDef(
name = "Junior Enrollment (Fall)\n (Averaged)",
aggregate = "mean",
),
`FirstTerm_Introductory_Physics_Course_Enrollments` = colDef(
name = "First-Term Intro Physics Course Enrollment\n (Averaged)",
aggregate = "mean",
),
`FirstTerm_Introductory_Physical_Science_and_Astronomy_Course_Enrollments` = colDef(
name = "First-Term Intro Phys. Sci. and Astro. Course Enrollment\n (Averaged)",
aggregate = "mean",
),
`Fall_NonUS_Graduate_Student_Enrollments` = colDef(
name = "Non-US Grad Student Enrollment (Fall)\n (Averaged)",
aggregate = "mean",
),
`Astro_Program` = colDef(
name = "Astro Program?",
minWidth = 80,
align = "right",
aggregate = JS("
function(values, rows) {
return(values
.filter((v) => v == 'combined' || v == 'separate')
.map((v) => v == 'combined' ? 'c' : 's')
.reduce(function (acc, curr) {
if (!acc.includes(curr))
acc.push(curr);
return acc;
}, [])
.join(', '));
}
")
)
)
)
#####
## Display Shiny with Crosstalk/htmltools widgets
#####
shiny::fluidPage(
shiny::fluidRow(
shiny::column(
2,
htmltools::browsable(
tagList(
tags$div(
class = 'd-grid gap-2 mx-auto',
# CSV download button
tags$button(
tagList(fontawesome::fa("download"), "\tCSV Data"),
class = 'btn btn-outline-success',
onclick = "Reactable.downloadDataCSV('rosterphys02_22-tbl', 'rosterphys02_22.csv')"
),
# Expand/Collapse button
tags$button(
"Expand/Collapse\nRows",
class = 'btn btn-info',
onclick = "Reactable.toggleAllRowsExpanded('rosterphys02_22-tbl')",
),
)
)
),
# filter-by degree
#filter_checkbox("degree", "Degree Level", tdata, ~Highest_Physics_Degree_Offered),
# TODO :> allow column visibility selection in this menu
# TODO :> allow column stickiness toggle in this menu
# Group Selection
htmltools::browsable(
tagList(
tags$div(
class = "p-2",
tags$b(
tags$label("Group By", `for` = "rosterphys02_22-grp_select"),
),
tags$ul(
id = "rosterphys02_22-grp_select",
class = "list-group",
tags$li(
class = "list-group-item",
tags$input(
class = "form-check-input me-1",
type = "checkbox",
value = "State",
id = "grp_select-state",
onchange = "Reactable.setGroupBy(
'rosterphys02_22-tbl',
[...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
.map((e) => e.checked ? e.value : false)
.filter((w) => w !== false)
)"
),
tags$label(
"State",
class = "form-check-label",
`for` = "grp_select-state",
)
),
tags$li(
class = "list-group-item",
tags$input(
class = "form-check-input me-1",
type = "checkbox",
value = "Highest_Physics_Degree_Offered",
id = "grp_select-program",
onchange = "Reactable.setGroupBy(
'rosterphys02_22-tbl',
[...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
.map((e) => e.checked ? e.value : false)
.filter((w) => w !== false)
)"
),
tags$label(
"Program",
class = "form-check-label",
`for` = "grp_select-program",
)
),
tags$li(
class = "list-group-item",
tags$input(
class = "form-check-input me-1",
type = "checkbox",
value = "Year",
id = "grp_select-year",
onchange = "Reactable.setGroupBy(
'rosterphys02_22-tbl',
[...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
.map((e) => e.checked ? e.value : false)
.filter((w) => w !== false)
)"
),
tags$label(
"Year",
class = "form-check-label",
`for` = "grp_select-year",
)
),
tags$li(
class = "list-group-item",
tags$input(
class = "form-check-input me-1",
type = "checkbox",
value = "Institution",
id = "grp_select-inst",
checked = NA,
onchange = "Reactable.setGroupBy(
'rosterphys02_22-tbl',
[...document.querySelectorAll('#rosterphys02_22-grp_select li input[type=checkbox]')]
.map((e) => e.checked ? e.value : false)
.filter((w) => w !== false)
)"
),
tags$label(
"Institution",
class = "form-check-label",
`for` = "grp_select-inst",
)
),
)
),
)
),
),
shiny::column(
10,
# DataTable
rxtbl
)
),
shiny::fluidRow(
# TODO :> add line-graph over time here using selected data from table
)
)Multiple notes arise from a brief overview of this collection:
## [1] "TODO:> ."
Institutions in this set is in the categorical division of
Highest Physics Degree Offered (BS,
MS, and PhD) in the most recent year of the
roster publication. We will return to this Bayesian decision boundary of
the search space later. Furthermore, we will consider
PhD-granting Institutions as separate from the
combined group of BS/MS-granting
Institutions.## `Institutions offering a PhD program as of 2022
phds_2022 <-
rosterdata |>
filter(
(
Highest_Physics_Degree_Offered == 'PhD' &
Year == 2022
)
) |>
ungroup() |>
select(State, Institution) |>
distinct() |>
group_by(State)
# TODO :> Use reactable alongside density heatmap at State/Region level
phds_2022There is at least one Institution in every state and
territory, except the US Virgin Islands, that has a Physics PhD
program.
There are 209 unique Institutions
offering a PhD as of 2022.
## [1] "TODO:> ."
The University of Minnesota (MN) - Minneapolis offered a
PhD through 2015-2019 (the past extent of
the dataset), but is no longer part of the reported data set. However,
the University of Minnesota - Twin Cities in
2020-Current, despite having no data prior to 2020,
produced similar numbers to the prior Institution. These two should be
merged for the purposes of this analysis, as if the Physics department
is continuous through these years across both Institutions. Similar
behavior occurs in a handful of other institutions.
TODO :> Multiple universities also changed their
program designation, with some recently adding or dropping their
PhD programs, or morphing their MS programs
into a PhD.
## [1] "TODO:> ."
TODO
Institutions of this type (for our analysis) exist: specifically
Stanford, Yale, Harvard,
Univ. Of Michigan - Ann Arbor, Columbia,
Cornell, and Rice Universities.
New York U also has this split between its College of
Liberal Arts and Sciences and an Engineering College, but the
distinction only appears at the BS/MS level starting in
20??, while Kettering U has an Applied Physics program at
the BS level. UC-Davis had an Applied program
that stopped reporting in 2010.Physics PhDs## [1] "TODO:> ."
CA (1926), NY
(1154), MA (1105), TX (907), IL
(700), FL (700), PA (623), OH
(599), MI (530), and CO (419). We should
ensure we perform a population-corrected analysis of the per-capita rate
of completing a PhD in these states. Interestingly, this ranking is
disjoint from the same ordering for the production of Physics Bachelors.
Yet another ranking is produced when ordered by descending quantity of
the three degree levels (BS, MS, and
PhD).Physics_PhDsPhD, alongside accumulative
BS degrees conferred are:Institution |
Physics PhDs |
Physics Bachelors |
|---|---|---|
CA-U of, Berkeley |
286 | 931 |
IL-U of, Urbana/Champaign |
270 | 1129 |
Colorado-U of, Boulder |
263 | 663 |
Harvard U |
262 | 423 |
Ohio State U |
252 | 586 |
Maryland-U of, Coll Park |
241 | 576 |
Stanford U |
225 | 197 |
Texas-U of, at Austin |
186 | 635 |
Mass Inst of Tech (MIT) |
184 | 376 |
Cornell U |
182 | 368 |
Chicago-U of |
181 | 449 |
WI-U of, Madison |
181 | 387 |
Michigan-U of, Ann Arbor |
180 | 439 |
CA-U of, Los Angeles |
170 | 644 |
SUNY-Stony Brook |
169 | 423 |
CA-U of, San Diego |
169 | 405 |
Texas A&M-College Station |
168 | 265 |
Washington-U of |
163 | 1286 |
Minnesota-U of, TwinCities |
163 | 416 |
Princeton U |
161 | 203 |
TODO :> How many departments inconsistently
reported data?
TODO :> How many departments ceased reporting per
each year, any departments end their PhD or BS
program?
TODO :> COVID-19 impacts?
TODO :> Frequency Distribution of accumulative
PhDs
TODO :> Top 10 by accumulative Physics Bachelors?
Frequency Distribution?
* what the f is going on with the higher number of exiting physics masters (like U Washington)?TODO :> Filter by latest year reported +
frequency graph/histogram by year, earliest year reported, all NA (e.g.,
dead departments, no reports, new depts).
* inconsistent reporting, `NA` reporting by `Year`TODO :> What is the boundary/difference in Phys
bachelors production and undergrad enrollment figures between programs
that offer BS/MS as highest vs PhD as highest?
TODO :> Use shapefiles to demonstrate gradient of
universities and availability of programs
is HBCU
is historically womens’
is private/public
is profit/non-profit
is private christian
is land-grant
Faculty count?
Institution-level COMMON dataset
Voronoi-Cell approximation of nearest PhD or BS
Department Specialties? <- n-gram analysis
Undergraduate population total enrollments?
County of State that Institution is located in
Median Salary of surrounding
Percentile of Median Salary of county relative to State
Percentile median salary of county Relative to Country?
Gini Index.
Measure of Racial Segregation, Ethnic Diversity metrics?
Median Rent of surrounding
% vote in last election?
Political Party of State governor?
Online Program availability?
Has nearby or attached National Lab (as.factor(...)
with Y/N levels; sep col of assoc. lab)
Link to faculty/dept page
FAFSA ID/Federal School Code
R1 research designation
Legal recreational weed
no pitbull bans
per-capita incidence rate of population that are women between 23-40 with at least a bachelors degree
per-capita incidence rate of population that are men between 20-50 with at least a bachelors degree
estimate ratio of female/male annual median income
TODO :> Ratio of Accumulative PhDs produced over
timeframe to average number of graduate students enrolled per
year
TODO :> Interpolation across NA rows?
TODO :> Hertzsprung-Russell Diagram here https://observablehq.com/@d3/hertzsprung-russell-diagram?intent=fork
PhD resiliency has an open definition, should include the number
of produced MS students as well
Resiliency is a conversion score, but should compare number of
produced combined BS/MS students to sum of prev year’s
Senior and Junior declared major enrollments
Conversion Score 1; BS: \[ raw\_score = \Big(1 -
\frac{N\_Juniors_{t-1}}{N\_Seniors_t} \Big) + \epsilon_t \]
Conversion Score 2 should compare number of produced
BS to prev years Senior enrollments
Roughly defined econometric using distance between moving-average upper and lower bounds at sampled time \(t\).
XGBoost
Random Forest